Replacing eligibility trace for action-value learning with function approximation
Author
Abstract
The eligibility trace is one of the most widely used mechanisms for speeding up reinforcement learning. Earlier reported experiments indicate that replacing eligibility traces perform better than accumulating eligibility traces. However, replacing traces are currently not applicable with function approximation methods in which states are not represented uniquely by binary features. This paper proposes two modifications to replacing traces that overcome this limitation. Experimental results on the Mountain-Car task indicate that the new replacing traces outperform both the accumulating and the ‘ordinary’ replacing traces.
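As a minimal sketch of the distinction the abstract relies on (the function and variable names below are illustrative, not the paper's): with binary features, an accumulating trace adds the active features to a decayed trace vector, whereas a replacing trace resets the active entries to 1, which is why the ordinary replacing trace presupposes binary feature vectors.

```python
import numpy as np

def update_trace(z, phi, gamma=1.0, lam=0.9, mode="replace"):
    """Decay the eligibility trace z, then credit the active binary features phi.

    mode == "accumulate": z <- gamma*lam*z + phi   (entries can grow past 1)
    mode == "replace":    z <- gamma*lam*z, then z_i = 1 wherever phi_i = 1
    """
    z = gamma * lam * z
    if mode == "accumulate":
        z = z + phi
    else:
        z = np.where(phi > 0, 1.0, z)
    return z

# Toy check: one binary feature stays active for two consecutive steps.
phi = np.array([0.0, 0.0, 0.0, 1.0])
z_acc = z_rep = np.zeros(4)
for _ in range(2):
    z_acc = update_trace(z_acc, phi, mode="accumulate")
    z_rep = update_trace(z_rep, phi, mode="replace")
print(z_acc[3], z_rep[3])   # 1.9 vs 1.0
```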
Similar references
An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function
We present an analysis of actor/critic algorithms in which the actor updates its policy using eligibility traces of the policy parameters. Most theoretical results for eligibility traces have concerned only the critic's value-iteration algorithms. This paper investigates what the actor's eligibility trace does. The results show that the algorithm is an extension of Williams' REINFORCE algori...
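The actor-side trace mentioned here is, in its generic textbook form, a trace over the policy parameters driven by the score function; the sketch below is that generic form under assumed names, not the specific algorithm analysed in the paper.

```python
import numpy as np

def actor_trace_step(theta, z_theta, grad_log_pi, delta,
                     alpha=0.01, gamma=0.99, lam=0.9):
    """One actor update with an eligibility trace over the policy parameters.

    grad_log_pi: gradient of log pi(a_t | s_t; theta) w.r.t. theta
    delta:       TD error reported by the critic
    """
    z_theta = gamma * lam * z_theta + grad_log_pi   # decay, then accumulate
    theta = theta + alpha * delta * z_theta         # step along the trace
    return theta, z_theta
```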
Experimental analysis of eligibility traces strategies in temporal difference learning
Temporal difference (TD) learning is a model-free reinforcement learning technique, which adopts an infinite horizon discount model and uses an incremental learning technique for dynamic programming. The state value function is updated in terms of sample episodes. Utilising eligibility traces is a key mechanism in enhancing the rate of convergence. TD(λ) represents the use of eligibility traces...
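For concreteness, a tabular TD(λ) update with an accumulating trace looks roughly as follows; `env_step` and the integer state encoding are assumptions made for this sketch.

```python
import numpy as np

def td_lambda_episode(env_step, v, s0, alpha=0.1, gamma=0.99, lam=0.9,
                      max_steps=1000):
    """Run one episode of tabular TD(lambda) with an accumulating trace.

    env_step(s) is assumed to return (reward, next_state, done);
    v is a NumPy array of state values indexed by integer state id.
    """
    z = np.zeros_like(v)                     # one trace entry per state
    s = s0
    for _ in range(max_steps):
        r, s_next, done = env_step(s)
        target = r if done else r + gamma * v[s_next]
        delta = target - v[s]                # TD error for this transition
        z[s] += 1.0                          # accumulate trace for state s
        v = v + alpha * delta * z            # update all eligible states
        z = gamma * lam * z                  # decay every trace
        if done:
            break
        s = s_next
    return v
```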
Reinforcement Learning with Replacing Eligibility Traces
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, b...
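For reference, the tabular forms of the two traces contrasted here are commonly written as follows (γ is the discount factor, λ the trace-decay parameter); this is the standard formulation rather than a quotation from the paper:

\[
\text{accumulating: } e_t(s) =
\begin{cases}
\gamma\lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t,\\
\gamma\lambda\, e_{t-1}(s) & \text{otherwise;}
\end{cases}
\qquad
\text{replacing: } e_t(s) =
\begin{cases}
1 & \text{if } s = s_t,\\
\gamma\lambda\, e_{t-1}(s) & \text{otherwise.}
\end{cases}
\]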
Efficient Asymptotic Approximation in Temporal Difference Learning
TD(λ) is an algorithm that learns the value function associated with a policy in a Markov Decision Process (MDP). We propose in this paper an asymptotic approximation of online TD(λ) with accumulating eligibility trace, called ATD(λ). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(λ) and to op...
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Recently, a new multi-step temporal-difference learning algorithm, called Q(σ), was proposed; it unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, like other multi-step temporal-difference learning algorithms, Q(σ) requires considerable memory and computation time. The eligibility trace is an important mechanism for transforming off-line updates into e...
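The role of σ can be seen from the one-step backup target; the on-policy sketch below uses assumed argument names and is not the paper's multi-step or trace-based formulation.

```python
import numpy as np

def q_sigma_one_step_target(r, q_next, pi_next, a_next, gamma=0.99, sigma=0.5):
    """One-step Q(sigma) backup target, on-policy sketch.

    q_next:  Q(S', a) for every action a
    pi_next: target-policy probabilities pi(a | S')
    a_next:  action actually taken in S'
    sigma=1 recovers the Sarsa target, sigma=0 the Expected-Sarsa /
    one-step Tree-Backup target; intermediate values interpolate.
    """
    sampled = q_next[a_next]                    # fully sampled backup
    expected = float(np.dot(pi_next, q_next))   # fully expected backup
    return r + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```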